Student names and IDs:
Vincent Belpaire (02009131),
Takuto Yamaguchi (02200943),
Elise Taragola (02008750)
Academic Year: 2022-2023
%matplotlib inline
'''
Use %matplotlib inline instead of %matplotlib widget when preparing the final report.
Otherwise the images are, unfortunately, not embedded in the pdf.
'''
from importstatements import *
from pylab import *
from scipy.signal import firwin, lfilter, filtfilt
from scipy.io import loadmat
import scipy as sci
import scipy.io
import scipy.signal as sig
from scipy import stats
import statsmodels
css_styling()
$ \newcommand{\fsamp}{{F_s}} \newcommand{\tsamp}{{T_s}} \newcommand{\unit}[1]{\ensuremath{\text{#1}}} \newcommand{\tmidx}{n} \newcommand{\tds}[2][\tmidx]{{#2(#1)}} \newcommand{\matcmd}[2][(\cdot)]{\texttt{#2}#1} \renewcommand{\matcmd}[2][(\cdot)]{\mathrm{#2}#1} \newcommand{\fcn}[1]{{\text{#1}}} \newcommand{\bigcb}[1]{{\big\{#1\big\}}} \newcommand{\Bigcb}[1]{{\Big\{#1\Big\}}} \newcommand{\bigsb}[1]{{\big[#1\big]}} \newcommand{\Bigsb}[1]{{\Big[#1\Big]}} \newcommand{\biggsb}[1]{{\bigg[#1\bigg]}} \newcommand{\pyt}{\matcmd[]{Python}} \newcommand{\fds}[2][z]{#2(#1)} \newcommand{\zpow}[1][-1]{z^{#1}} \newcommand{\cbr}[1]{\big\{#1\big\}} \newcommand{\mat}[1]{\boldsymbol{#1}} $
In this exercise, we will further explore bootstrapping techniques to evaluate the quality of correlation/linear-regression analyses on collected datapoints, as well as perform hypothesis testing on event-related brain signals. Both of these techniques will come in handy in your future biomedical data-science projects. You will also be pointed to tools/packages that can help you visualize or better understand your collected data.
When we have a data array of two variables, it is possible to make a scatter plot of the data (i.e. plot the x-variable against the y-variable) and study whether the variables are linearly related. Using a correlation analysis (sci.stats.pearsonr or sci.stats.spearmanr) we can estimate the strength of the relationship between the two variables, and if we apply a linear regression model (sci.stats.linregress), we can find the slope and intercept parameters of the best linear fit y = slope * x + intercept to the data.
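As a minimal illustration of how these two functions relate (on synthetic data, not the dataset below): linregress returns the same r-value that pearsonr computes, together with the parameters of the linear fit.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(0, 1.0, 50)   # known linear trend plus noise

r, p = stats.pearsonr(x, y)                  # correlation strength and p-value
res = stats.linregress(x, y)                 # slope, intercept, rvalue, pvalue, stderr

print(f"pearsonr: r = {r:.3f}, p = {p:.2e}")
print(f"linregress: slope = {res.slope:.3f}, intercept = {res.intercept:.3f}, r = {res.rvalue:.3f}")
```

The recovered slope should be close to the true value of 2, and `res.rvalue` matches the Pearson r.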
Regression models are often applied to collected data-sets to infer whether one variable predicts the other, or whether there is a causal relationship between two variables. In this respect it is important to consider the certainty with which you can draw your conclusion from the collected data, and a leave-one-out bootstrap procedure can come in handy here. First, have a look at plotting the following data-set:
DPTH = np.array([35.84, 20.09, 22.34, 31.24, 16.20, 29.91, 24.79, 15.91, 7.40, 27.05, 33.80, 26.71, 21.96, 33.36, 22.89,
23.66, 33.42, 35.66, 27.88, 28.97])
EFR = np.array([0.02309, 0.03251, 0.01280, 0.03868, 0.02843, 0.01743, 0.02855, 0.02433, 0.04578, 0.02599, 0.03936, 0.03773,
0.03184, 0.00411, 0.02412, 0.03997, 0.01557, 0.01502, 0.01815, 0.01199])
EFRstd = np.array([0.00564, 0.00644, 0.00881, 0.00473, 0.0067, 0.00982, 0.00777, 0.00689, 0.00594, 0.00565, 0.01148, 0.00832,
0.00574, 0.01074, 0.01473, 0.00958, 0.01045, 0.00775, 0.01162, 0.00577])
plt.figure()
plt.plot(DPTH,EFR,'o')
#unit of DPTH [dB SPL]
#unit of EFR [\muV]
Some background about the data you are working with: both variables are objective measures of hearing function recorded from the same individual, i.e. each datapoint represents two measures taken from the same person.
Envelope-following responses (EFRs) are auditory evoked potentials recorded with an EEG setup in response to sound stimulation with an amplitude-modulated pure tone. The ensemble of auditory-nerve and brainstem neurons is very capable of tracking the stimulus envelope, and the EEG recording will hence show a peak at the auditory envelope frequency. Animal studies have demonstrated that the strength of this peak in an individual might inform about the total number of auditory-nerve fibers that individual has available. When there is hearing damage, this number will be lower.
The second variable is the distortion-product otoacoustic emission threshold (DPOAE TH). This is a marker of hearing which is often used in clinical practice to assess whether the outer hair cells are intact. The lower the DPOAE threshold (the minimal sound level at which a measurable DPOAE is recorded), the more sensitive your hearing is. People with damaged outer hair cells generally have higher DPOAE thresholds (DPTH).
Given this scatter plot, you could fit a single regression line and draw a conclusion about the causal relationship between the two variables. However, you can clearly see that there is an outlier. If you leave it in, the relationship between the two variables might appear stronger; if you remove it, there might not be a relationship left. You found no other reason to discard this datapoint (i.e. the data-collection quality was good).
Task
To analyse the variability of your regression model, you can apply a leave-one-out bootstrap procedure. In this procedure, you remove a single data-point from the set and calculate the regression statistics as you would for the complete dataset. This will yield N regression curves, p-values and r-values that you can use to study the robustness of your linear regression model. That is, you can plot distributions (histograms) of your p/r-values to assess whether the variability is reasonable.
First, apply this method to your dataset (x-variable: DPTH, y-variable: EFR) by running the sci.stats.linregress function in a for loop (look up its use online), and plot the regression curves belonging to the individual leave-one-out iterations on top of the scatter plot. Also include the reference regression curve that you obtained when including all data-points.
Second, perform the same analysis after dropping the outlier value from the dataset using np.delete.
Compare the robustness of the two regression models to address the following questions: What is the effect of the one apparent outlier on the interpretation of the relationship? Does the dataset allow you to infer that you can use the EFR and DPOAE interchangeably in a hearing screening procedure?
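np.delete, used here both for the leave-one-out step and for dropping the outlier, returns a copy of the array with the given indices removed; the original array is untouched. A tiny demonstration:

```python
import numpy as np

a = np.array([10.0, 20.0, 30.0, 40.0])
b = np.delete(a, 1)     # drop the element at index 1; returns a new array
print(b)                # [10. 30. 40.]
print(a)                # [10. 20. 30. 40.] -- unchanged
```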
x, y = DPTH, EFR
def leave_one_out_analysis(x, y, labels=['x', 'y'], add_to_title=''):
# reference regression model with all datapoints
reg = sci.stats.linregress(x, y)
# leave-one-out bootstrap method
N = len(x)
slopes, intercepts, rvals, pvals = np.zeros(N), np.zeros(N), np.zeros(N), np.zeros(N)
for i in range(N):
a, b, r, p, stderr = sci.stats.linregress(np.delete(x, i), np.delete(y, i))
slopes[i], intercepts[i], rvals[i], pvals[i] = a, b, r, p
# plotting the regression lines and the histograms for the r- and p-values
fig, (ax1, ax2, ax3) = plt.subplots(ncols=3, figsize=(18,4))
# regression lines
ax1.scatter(x, y, color='black')
ax1.set_title("Regression curves" + add_to_title)
for i in range(N):
ax1.plot(x, slopes[i] * x + intercepts[i], color='blue', alpha=0.1)
ax1.plot(x, reg.slope * x + reg.intercept, color='red',
label='Reference Model'+f'\nr-value :{round(reg.rvalue,3)}'+f'\np-value :{round(reg.pvalue,3)}')
ax1.set_xlabel(labels[0])
ax1.set_ylabel(labels[1])
ax1.legend()
# r-values histogram
ax2.hist(rvals, edgecolor='black')
ax2.set_title('Histogram of r-value' + add_to_title)
ax2.set_xlabel('r-value')
ax2.set_ylabel('Occurrence')
# p-values histogram
ax3.hist(pvals, edgecolor='black')
ax3.set_title('Histogram of p-value' + add_to_title)
ax3.set_xlabel('p-value')
ax3.set_ylabel('Occurrence')
# analysis with outlier
leave_one_out_analysis(x, y, labels=['DPTH', 'EFR'])
# find the outlier: the datapoint furthest from the mean center point, measured in
# standardized units (x and y have very different scales, so we z-score both first)
zx = (x - np.mean(x)) / np.std(x)
zy = (y - np.mean(y)) / np.std(y)
index_outlier = np.argmax(zx**2 + zy**2)
# analysis without outlier
leave_one_out_analysis(np.delete(x, index_outlier), np.delete(y, index_outlier), labels=['DPTH', 'EFR'], add_to_title='without outlier')
The yielded r-value (correlation coefficient) represents the strength of the linear relationship between EFR and DPTH, while the obtained p-value indicates whether the correlation between the two variables is statistically significant. The two regression analyses above show a negative correlation between the two variables. However, after removing the outlier (the second method), the r-value changed from -0.446 to -0.271, i.e. its absolute value decreased, indicating that the negative linear relationship between these two variables became weaker. Additionally, the p-value increased from 0.049 to 0.263, suggesting that without the outlier the correlation between the two variables is not statistically significant.
Therefore, the single apparent outlier strongly influences the result: keeping it (the first method) yields a significant correlation, while removing it (the second method) suggests that the correlation is weaker and no longer significant.
The dataset does not allow us to infer that EFR and DPOAE are interchangeable variables in a hearing screening procedure, because the r-value is too low in absolute value, and it becomes even lower when the outlier is removed. This indicates a weak correlation between the two variables, hence interchangeability cannot be concluded.
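A quick cross-check, not part of the assignment: Spearman's rank correlation (sci.stats.spearmanr, mentioned earlier) is much less sensitive to a single high-leverage point than Pearson's r, so comparing the two can also flag outlier-driven results. A sketch on synthetic data with a deliberately planted outlier:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 20)
y = -0.5 * x + rng.normal(0, 0.5, 20)        # clear negative trend, like DPTH vs EFR
x_out = np.append(x, 10.0)                   # one extreme, discordant point
y_out = np.append(y, 10.0)

r_pear_clean, _ = stats.pearsonr(x, y)
r_pear_out, _ = stats.pearsonr(x_out, y_out)
r_spear_clean, _ = stats.spearmanr(x, y)
r_spear_out, _ = stats.spearmanr(x_out, y_out)

print(f"Pearson:  {r_pear_clean:.2f} -> {r_pear_out:.2f} with the outlier")
print(f"Spearman: {r_spear_clean:.2f} -> {r_spear_out:.2f} with the outlier")
```

The Pearson coefficient swings far more than the Spearman coefficient when the single outlier is added, which is the same sensitivity we observed in the leave-one-out analysis.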
Here, we will make use of the fact that event-related potentials (EEG responses to a sensory event) are typically recorded over multiple repetitions of the same stimulus to improve the signal-to-noise ratio of the data (see Module 6). We will use this information to perform bootstrapping-based hypothesis testing. For the specific example at hand, we have about 5000 repetitions of an acoustic click, to which we recorded the EEG signal (auditory brainstem response, ABR). The resulting signal has a very typical waveform with distinct peaks that occur within the first 10 ms after the click onset, and which reflect different processing stages along the auditory pathway. The peak at 1-2 ms (wave-I) reflects the ensemble of auditory-nerve fibers available, whereas the peak at 5-7 ms (wave-V) reflects the brainstem processing of sound (inferior colliculus). The ABR is hence some sort of impulse response of the ear, and is often clinically used to detect peripheral brain lesions, or as a screening tool for neonates. The signal also carries information about how the ear processes sound (i.e. for research purposes), and in this example, we would like to assess the following:
Work through the different steps to preprocess your data: start with the 100-dB ABR to go through all preprocessing steps, and then repeat the same steps for the 80-dB ABR. Afterwards, you can compare the EEG response features statistically and formulate a motivated answer to the above questions. First, we will perform the filtering, epoching and averaging such that we can visualise the evoked potentials, calculate their confidence intervals and perform the statistical assessment.
#Load in the data
ABRData80 = scipy.io.loadmat('ABR80dB.mat')
ABRData100 = scipy.io.loadmat('ABR100dB.mat')
#automatic filtering is already applied between 10 and 1500 Hz as part of the recording software,
#but we will at least apply a HP filter with higher cut-off frequency to remove low-frequency and 50-Hz noise.
FS=10000
def load_in_data(data):
#here we do all the analysis for the 100dB stimulus, you should do the same for the 80 dB stimulus.
Sig=data["DATA"] #has the raw recording trace of EEG data (i.e. 5 min recording)
Ch1=Sig[0,:] #is the recording channel we will work with
T=Sig[2,:] #is the channel with equal duration of the recording, but which has the trigger events
#(i.e. sample numbers at which a click was presented)
#find all the triggers
indxP = np.where(T == 20001)[0]
indxN = np.where(T == -20001)[0]
indx = np.concatenate((indxP,indxN),axis=0) #index has the sample numbers at which an event started, and is necessary to epoch the data.
#Get a feeling for your data
fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(15, 5))
ax1.plot(Ch1)
ax1.set_title("Recorded signal")
ax2.plot(T)
ax2.set_title("Trigger signal")
ax2.set_xlim((10000,20000))
#Note that the triggers mark either the positive or the negative polarity of the stimulus click.
#This is not relevant here as we will just average both polarities together, but those interested could also repeat
#the analysis for only the positive or negative polarity click stimuli.
return (Sig, Ch1, T, indxP, indxN, indx)
Sig_100, Ch1_100, T_100, indxP_100, indxN_100, indx_100 = load_in_data(ABRData100)
Sig_80, Ch1_80, T_80, indxP_80, indxN_80, indx_80 = load_in_data(ABRData80)
Task
Apply a Butterworth band-pass filter (sig.butter) to your data (Ch1) with cut-off frequencies of 100 and 1500 Hz. Use a zero-phase design to perform the filtering (see modules 5, 6). First apply the HP filter and afterwards the LP filter. Plot the magnitude spectrum of the original signal and the filtered signal to check whether your filter worked. What do you think the spectral peaks are (Hint: check their frequencies)?
def filter_signal(signal, FS=10000, order=8, Wn_LPF=1500, Wn_HPF=100, title=''):
# HPF
B, A = sig.butter(order, Wn_HPF, btype='highpass', analog=False, output='ba', fs=FS)
filtered_signal = filtfilt(B, A, signal)
# LPF
B, A = sig.butter(order, Wn_LPF, btype='lowpass', analog=False, output='ba', fs=FS)
filtered_signal = filtfilt(B, A, filtered_signal)
# FFT
fft_signal, fft_filtered_signal = np.fft.fft(signal), np.fft.fft(filtered_signal)
freqs = np.fft.fftfreq(signal.size, 1/FS)
Amp_signal, Amp_filtered_signal = np.abs(fft_signal), np.abs(fft_filtered_signal)
# plotting
# Amplitude response
fig, ax = plt.subplots(figsize=(12,4))
ax.plot(freqs, Amp_signal,color = "b")
ax.plot(freqs, Amp_filtered_signal,color = "k")
ax.set_title(title)
ax.set_xlabel('Frequency [Hz]')
ax.set_ylabel('Magnitude')
ax.set_xlim((0,1000))
ax.set_xticks(np.linspace(50,1000,20, dtype=int))
ax.set_xticklabels(np.linspace(50,1000,20, dtype=int))
ax.legend(["Original","Filtered"])
return filtered_signal
filtered_Ch1_100 = filter_signal(Ch1_100, FS=FS, title='Amplitude spectrum of original and filtered signal: 100 dB')
filtered_Ch1_80 = filter_signal(Ch1_80, FS=FS, title='Amplitude spectrum of original and filtered signal: 80 dB')
The spectral peaks occur at harmonics of 50 Hz, with the highest amplitude at the first harmonic. This indicates electrical mains noise, since 50 Hz is the AC power-line frequency. After filtering, the amplitude of the 50 Hz component is massively reduced and that of the second harmonic (100 Hz) is roughly halved. For the higher harmonics the reduction is minimal, but their amplitudes on their own do not exceed that of the second harmonic, even after filtering.
After filtering, it is time to epoch your data into different trials such that you can perform an averaging across trials. The sample numbers at which a stimulus was presented, are given by the indx array. When epoching for this type of data, it is important to cut your signal 5 ms before the trigger (indx events) and analyse up to 20 ms after the trigger event.
Task
Count the number of events (stimulus clicks), and make a matrix that has dimensions (len(indx) x samples). The samples in each epoch correspond to a time window that starts 5 ms before the click and runs until 20 ms after the click. Make a corresponding time axis, with time 0 corresponding to the time of the click. Make a figure that plots time vs all your epochs at once (you might have to transpose your matrix using the .T expression).
This visualisation is helpful when determining a threshold above which you will remove "bad" or "noisy" epochs. Here, you will likely only remove 5 or 10% of the noisiest epochs (the ones with the largest absolute amplitude). Write code to automatically remove epochs with amplitudes above a reasonable "self-determined" threshold value. Discard the noisy epochs and compare your epochs before and after noise rejection. Did your algorithm work?
Once you have rejected the noisy epochs, you can average the remaining epochs to obtain the mean ABR waveform (plotted over time). Do you notice the wave-V peak near 6-7 ms?
Perform the same preprocessing steps for the 80 dB ABR data.
def epoch_data(signal, indx, FS=10000, add_to_title=''):
t = 1000*np.arange(-0.005, 0.020, 1/FS)
epochs = np.zeros((len(indx)-1, len(t)))
ms_5 = int(FS*0.005)
ms_20 = int(FS*0.020)
fig, ax = plt.subplots(figsize=(12, 4))
for i, click in enumerate(indx[1:]):
epochs[i] = signal[click-ms_5: click+ms_20]
ax.plot(t, epochs[i])
ax.set_title('Epochs' + add_to_title)
ax.set_xlabel('Time [ms]')
ax.set_ylabel('Voltage [mV]')
return t, epochs
t_100, epochs_100 = epoch_data(filtered_Ch1_100, indx_100, FS=FS, add_to_title=' - 100dB')
t_80, epochs_80 = epoch_data(filtered_Ch1_80, indx_80, FS=FS, add_to_title=' - 80dB')
def remove_epochs_above_treshold(epochs, t=[],remove_fraction=0.1, add_to_title=''):
N = len(epochs)
epoch_max = np.zeros(N)
for i, epoch in enumerate(epochs):
epoch_max[i] = np.max(np.abs(epoch))
indices = np.argsort(epoch_max)
sorted_max = np.sort(epoch_max)
threshold = int(N*(1-remove_fraction))
passed_max = sorted_max[:threshold]
delete_indices = indices[threshold:]
passed_epochs = np.delete(epochs, delete_indices, axis=0)
fig, (ax1, ax2) = plt.subplots(ncols=2, figsize = (18,5))
ax1.set_title("Epochs pass threshold" + add_to_title)
ax1.plot(sorted_max)
ax1.plot(passed_max)
ax1.set_xlabel("Epoch number (sorted by maximum)")
ax1.set_ylabel("Epoch maximum [mV]")
ax1.legend(["all epochs","passed epochs"])
ax2.set_title("Epochs after removing noisy epochs" + add_to_title)
for i in range(len(passed_epochs)):
if len(t) == 0:
ax2.plot(passed_epochs[i])
else:
ax2.plot(t, passed_epochs[i])
ax2.set_xlabel("Time [ms]")
ax2.set_ylabel("Voltage [mV]")
return passed_epochs
passed_epochs_100 = remove_epochs_above_treshold(epochs_100, t=t_100, add_to_title=' - 100dB')
passed_epochs_80 = remove_epochs_above_treshold(epochs_80, t=t_80, add_to_title=' - 80dB')
Yes, it did: after rejection, no epoch exceeds an absolute amplitude of 10 mV anymore.
t = t_100
def calculate_mean_ABR(epochs):
return np.mean(epochs, axis=0)
noisy_mean_ABR_100, mean_ABR_100 = calculate_mean_ABR(epochs_100), calculate_mean_ABR(passed_epochs_100)
noisy_mean_ABR_80, mean_ABR_80 = calculate_mean_ABR(epochs_80), calculate_mean_ABR(passed_epochs_80)
fig = plt.figure(figsize=(18,4))
ax = fig.add_subplot(121)
plt.plot(t,noisy_mean_ABR_100)
plt.plot(t,mean_ABR_100)
plt.title("100dB mean ABR waveform")
plt.xlabel("time [ms]")
plt.ylabel("Voltage [mV]")
plt.legend(["before rejection", "after rejection"])
ax = fig.add_subplot(122)
plt.plot(t,noisy_mean_ABR_80)
plt.plot(t,mean_ABR_80)
plt.title("80dB mean ABR waveform")
plt.xlabel("time [ms]")
plt.ylabel("Voltage [mV]")
plt.legend(["before rejection", "after rejection"])
plt.show()
fig = plt.figure(figsize=(18,4))
ax = fig.add_subplot(121)
plt.plot(t,mean_ABR_100 ,color = "r")
plt.plot(t,mean_ABR_80,color = "b")
plt.fill_betweenx([-0.3,0.3], 6, 7, color='purple', alpha=0.5)
plt.xlabel("time [ms]")
plt.ylabel("Voltage [mV]")
plt.legend(["100 dB signal", "80 dB signal", "wave-V 6-7 ms"])
plt.show()
Yes, the wave-V peak is clearly visible in the purple-shaded 6-7 ms region.
Task
Use the matrix with epochs (after artifact rejection) to generate 2.5th and 97.5th percentile waveforms of the 80 and 100 dB ABR. Specifically: calculate and store 500 resampled mean waveforms from the epoch matrix, by randomly drawing the total number of epochs (with replacement) from the matrix before you calculate the mean waveform (use np.random.choice). After this procedure, you have 500 resampled mean waveforms over time stored.
Plot your matrix with 500 mean ABR waveforms (use yourmatrixname.T in case there is a dimension mismatch with your time vector), to see that most mean ABR waveforms preserve the ABR waveform maximum near 5 ms.
Next, you can calculate the CI025 and CI975 waveforms out of the (500 x time) matrix with resampled mean signals. To do this (see also Module 6), you need to write a for loop over your time vector (for n in range(0, len(time)):) that, for each sample, ranks the values from low to high. You can store the indices of the ranked values using the np.argsort function. Once you did this, you have the indices corresponding to ascending-order values for each time sample. Take the index numbers for each time sample that correspond to the 2.5 and 97.5%, respectively, to generate the CI025 and CI975 waveforms.
For the 80 and 100 dB waveforms, plot the mean waveform (before bootstrapping) along with the waveforms of the confidence bounds on the same plot (with different colors for the 80 and 100 dB condition). You can also modify the following script to plot the confidence interval as a filled area around the mean waveform (but then it is graphically better to plot the results on two figures instead).
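Not required by the task, but a handy cross-check on the manual ranking: NumPy's np.percentile computes (interpolated) percentiles along an axis in one line. On a random stand-in for the (500 x time) matrix, the two approaches agree closely:

```python
import numpy as np

rng = np.random.default_rng(2)
samples = rng.normal(0, 1, (500, 250))        # stand-in for the (500 x time) matrix

# manual ranking per time sample, as described in the task
M = samples.shape[0]
ranked = np.sort(samples, axis=0)             # equivalent to indexing with np.argsort
CI025 = ranked[int(0.025 * M)]
CI975 = ranked[int(0.975 * M)]

# one-line alternative (uses interpolation, so values differ slightly)
lo, hi = np.percentile(samples, [2.5, 97.5], axis=0)

print(f"max |CI025 - lo| = {np.max(np.abs(CI025 - lo)):.3f}")
print(f"max |CI975 - hi| = {np.max(np.abs(CI975 - hi)):.3f}")
```

The small remaining differences come from np.percentile interpolating between order statistics, whereas the ranking approach picks a single one.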
def generate_percentiles(epochs, t=[], add_to_tilte=''):
# calculate and store 500 resampled mean ABR waveforms
M, N, n = 500, len(epochs), len(epochs[0])
samples = np.zeros((M, n))
for i in range(M):
random_epochs = np.random.choice(N, N, replace=True)
samples[i,:] = np.mean(epochs[random_epochs], axis=0)
# plot the 500 resampled mean ABR waveforms
fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(18, 4))
for i in range(M):
if len(t) == 0:
ax1.plot(samples[i])
else:
ax1.plot(t, samples[i])
ax1.set_xlabel('Time (ms)')
ax1.set_ylabel('Voltage (mV)')
ax1.set_title('Sampled ABR waveforms'+add_to_tilte)
# calculate CI025 and CI975
CI025, CI975 = np.zeros(n), np.zeros(n)
for i in range(n):
sorted_indices = np.argsort(samples[:, i])
CI025[i] = samples[sorted_indices[int(0.025*M)], i]
CI975[i] = samples[sorted_indices[int(0.975*M)], i]
#noise_floor = (CI975-CI025)/2
ax2.plot(t, np.mean(epochs, axis=0), label='mean ABR waveform')
ax2.fill_between(t, CI975, CI025, color='C0', alpha=0.5, label='confidence interval')
ax2.set_xlabel('Time (ms)')
ax2.set_ylabel('Voltage (mV)')
ax2.set_title('mean ABR waveform with confidence interval'+add_to_tilte)
ax2.legend()
return samples, CI025, CI975
samples_100, CI025_100, CI975_100 = generate_percentiles(passed_epochs_100, t=t, add_to_tilte=' - 100dB')
samples_80, CI025_80, CI975_80 = generate_percentiles(passed_epochs_80, t=t, add_to_tilte=' - 80dB')
Task
Now that we have calculated the confidence intervals on the two ABR waveforms, we are ready to perform the statistics to address our research questions:
To transform the samples of your epoch into latency, please note that you can use FS to go from sample number to the time domain, and that the onset of the stimulus corresponds to time 0, knowing that you started your epoch at -5 ms. Also convenient for your analysis is to know that the ABR wave-V peak corresponds to the maximum of the signal, hence it should be easy to calculate the wave-V response peak in a for-loop.
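The conversion described above, written out as a small helper (the 5 ms pre-stimulus window and FS = 10000 come from the epoching step; the helper name is ours):

```python
FS = 10000
PRE_MS = 5.0                                  # the epoch starts 5 ms before the click

def sample_to_latency_ms(k, fs=FS, pre_ms=PRE_MS):
    """Convert an epoch sample index k to latency in ms relative to click onset."""
    return k / fs * 1000.0 - pre_ms

print(sample_to_latency_ms(0))                # epoch start
print(sample_to_latency_ms(50))               # the click itself
print(sample_to_latency_ms(115))              # a sample in the wave-V region
```

Sample 0 maps to -5 ms, sample 50 to 0 ms (the click), and sample 115 to 6.5 ms, i.e. within the expected wave-V latency range.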
To address the wave-V peak question: first identify for the 80 and 100 dB signal, which sample number you want to consider for each response (i.e. the sample at which the wave-V maximum occurs in the mean ABR response). Then generate a normal sample of ABR wave-V maxima from your stored 500 resampled mean waveforms. Plot the histograms of your 500 wave-V amplitude draws for the 80 and 100 dB waveforms. Do they visually overlap? Perform a z or t-test to asses significance of your result. What is the zero-hypothesis for your test, and interpret the test outcome?
To address the latency question: in a for loop, find the index which corresponds to the maximum of each of the 500 resampled mean waveforms. You can then calculate a normal distribution of ABR wave-V latencies for each of the 80 dB and 100 dB waveforms. Plot a histogram of your latency estimate distributions and perform a t-test or z-test to formulate an answer to your research question.
def generate_normal_sample(samples, t):
M = len(samples)
# index of the wave-V peak (the signal maximum) in each of the M resampled waveforms
indices_wave_V = np.argmax(samples, axis=1)
sample_wave_V_peak = samples[np.arange(M), indices_wave_V]
# map the peak sample indices onto the epoch time axis to obtain the latencies
sample_wave_V_latency = t[indices_wave_V]
return sample_wave_V_peak, sample_wave_V_latency
wave_V_peak_100, wave_V_latency_100 = generate_normal_sample(samples_100, t)
wave_V_peak_80, wave_V_latency_80 = generate_normal_sample(samples_80, t)
fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(15, 5))
ax1.hist(wave_V_peak_100, alpha=0.7, label='100dB', bins=10)
ax1.hist(wave_V_peak_80, alpha=0.7, label='80dB', bins=10)
ax1.set_title('Comparison of the wave-V peak amplitude')
ax1.set_xlabel('resampled wave-V peak [mV]')
ax1.set_ylabel('occurrence out of 500 draws')
ax1.legend()
ax2.hist(wave_V_latency_100, alpha=0.7, label='100dB', bins=np.arange(-2,11))
ax2.hist(wave_V_latency_80, alpha=0.7, label='80dB', bins=np.arange(-2,11))
ax2.set_title('Comparison of the wave-V latency')
ax2.set_xlabel('resampled latency [ms]')
ax2.set_ylabel('occurrence out of 500 draws')
ax2.legend()
def t_test(x1, x2):
t, p = scipy.stats.ttest_ind(x1, x2)
print('t-test:')
print('-> t-statistic: ', t)
if p < 0.05:
significance = 'there is a significant difference'
else:
significance = 'there is no significant difference'
print('-> p-value: ', p, ' => ', significance)
print('H0: the amplitude of the wave-V peak for an ABR to 100 dB is the same as for an ABR to 80 dB')
print("H1: amplitude 100 dB > amplitude 80 dB")
print("to reject the hypothesis: p < 0.05 (single-sided test)")
t_test(wave_V_peak_100, wave_V_peak_80)
print()
print("H0: the latency of the wave-V peak for an ABR to 100 dB is the same as for an ABR to 80 dB")
print("H1: latency 100 dB < latency 80 dB")
print("to reject the hypothesis: p < 0.05 (single-sided test)")
t_test(wave_V_latency_100, wave_V_latency_80)
H0: the amplitude of the wave-V peak for an ABR to 100 dB is the same as for an ABR to 80 dB
H1: amplitude 100 dB > amplitude 80 dB
to reject the hypothesis: p < 0.05 (single-sided test)
t-test:
-> t-statistic:  18.72059740267643
-> p-value:  2.9543239981885773e-67  =>  there is a significant difference

H0: the latency of the wave-V peak for an ABR to 100 dB is the same as for an ABR to 80 dB
H1: latency 100 dB < latency 80 dB
to reject the hypothesis: p < 0.05 (single-sided test)
t-test:
-> t-statistic:  -6.814119985401665
-> p-value:  2.7472516796040592e-11  =>  there is a significant difference
In conclusion, we can state that there is a significant difference between the ABRs generated by a 100 dB and an 80 dB click: for 100 dB the wave-V amplitude is higher than for 80 dB, and the wave-V latency is shorter than for 80 dB.
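One caveat worth noting: scipy.stats.ttest_ind is two-sided by default, while the hypotheses above are one-sided. Recent SciPy versions (1.6 and later) accept an alternative argument; for a positive t-statistic the one-sided p-value is simply half the two-sided one, so the conclusions here are unaffected. A sketch on synthetic stand-in distributions (the real ones come from the bootstrap above):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
# stand-ins for the two bootstrap amplitude distributions
peaks_100 = rng.normal(0.30, 0.02, 50)
peaks_80 = rng.normal(0.28, 0.02, 50)

# two-sided (the default) vs. one-sided H1 "amplitude 100 dB > amplitude 80 dB"
t2, p2 = stats.ttest_ind(peaks_100, peaks_80)
t1, p1 = stats.ttest_ind(peaks_100, peaks_80, alternative='greater')

print(f"two-sided p = {p2:.3e}, one-sided p = {p1:.3e}")
```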